Abstract: We consider the entropy of sums of independent 
discrete random variables, in analogy with Shannon's Entropy 
Power Inequality, where equality holds for normals. In our case, 
infinite divisibility suggests that equality should hold for Poisson 
variables. We show that some natural analogues of the Entropy 
Power Inequality do not in fact hold, but propose an alternative 
formulation which does always hold. The key to many proofs of 
Shannon's Entropy Power Inequality is the behaviour of entropy 
on scaling of continuous random variables. We believe that 
Rényi's operation of thinning discrete random variables plays a 
similar role to scaling, and give a sharp bound on how the entropy 
of ultra log-concave random variables behaves on thinning. In 
the spirit of the monotonicity results established by Artstein, 
Ball, Barthe and Naor, we prove a stronger version of concavity 
of entropy, which implies a strengthened form of our discrete 
Entropy Power Inequality. 

Keywords: convolution, discrete random variables, entropy. 
I. Review of previous work 

It is natural to consider the entropy of the sum of independent random variables, for example in proving theoretical results concerning the Central Limit Theorem or in practical models of information transmission involving addition of noise to the signal. 

Pedagogically speaking, the entropy $H$ of discrete random variables usually comes first, with the differential entropy $h$ of continuous random variables coming later. However, results from functional analysis imply properties of the differential entropy which do not yet have discrete counterparts. For example, Shannon [1] stated Theorem 1.1, known as the Entropy Power Inequality (EPI), which was later rigorously proved by Stam [2] and by Blachman [3] using an argument based on the heat equation. Write $E(t) = \frac{1}{2} \log(2 \pi e t)$ for the entropy of a Gaussian random variable with finite variance $t$, and define $v(X) = E^{-1}(h(X)) = e^{2h(X)}/(2 \pi e)$ for the entropy power of a random variable $X$ with differential entropy $h(X)$. (We use log to represent the natural logarithm throughout this paper.) 

Theorem 1.1 (EPI): For independent continuous $X$ and $Y$, the sum $X + Y$ satisfies 

$$ v(X + Y) \ge v(X) + v(Y), \tag{1} $$

with the only non-trivial case of equality being when $X$ and $Y$ are Gaussian. 
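
As an illustration of Theorem 1.1, the following Python sketch evaluates both sides of (1) in two cases where the differential entropies are known in closed form: Gaussians (where the entropy power equals the variance) and two independent Uniform[0, 1] variables (whose sum is triangular on [0, 2], with differential entropy 1/2 nat). The function names are hypothetical and chosen only for this example.

```python
import math

def entropy_power(h):
    """Entropy power v = exp(2h) / (2*pi*e) for a differential entropy h in nats."""
    return math.exp(2.0 * h) / (2.0 * math.pi * math.e)

def gaussian_entropy(variance):
    """Differential entropy E(t) = 0.5 * log(2*pi*e*t) of a Gaussian with variance t."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)

# Gaussian case: variances 2 and 3 add to 5, so (1) holds with equality.
v_gauss_sum = entropy_power(gaussian_entropy(2.0 + 3.0))
v_gauss_parts = entropy_power(gaussian_entropy(2.0)) + entropy_power(gaussian_entropy(3.0))

# Uniform case: h(X) = h(Y) = log(1) = 0, and h(X + Y) = 1/2 for the triangular sum.
v_unif_sum = entropy_power(0.5)
v_unif_parts = 2.0 * entropy_power(0.0)

print("Gaussian:", v_gauss_sum, v_gauss_parts)
print("Uniform: ", v_unif_sum, v_unif_parts)
```

In the uniform case the inequality is strict, consistent with equality being reserved for Gaussians.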

A key role is played in many proofs of Theorem 1.1 by the operation of scaling of continuous random variables, using the fact that for any $\alpha > 0$, 

$$ v(\sqrt{\alpha} X) = \alpha v(X). \tag{2} $$

One major contribution of this paper is Theorem 2.4 below, which shows that a one-sided version of (2) holds for discrete random variables. In this case, the operation of scaling is replaced by the thinning operation introduced by Rényi [4]. 

As is implicit in the work of Verdú and Guo [5], Theorem 1.1 can be rephrased in terms of scalings, in the form of Corollary 1.2 below. Lieb [6] and Dembo, Cover and Thomas [7] prove the Entropy Power Inequality by working with the Rényi entropy (a generalization of Shannon's quantity). They use properties of $p$-norms on convolution given by Beckner's sharp form [8] of the Young inequality. Using a particular parameterization, they show that this Young inequality implies that the differential entropy is concave with respect to normalized linear combinations, that is, for any $0 \le \alpha \le 1$: 

$$ h(\sqrt{\alpha} X + \sqrt{1-\alpha} Y) \ge \alpha h(X) + (1-\alpha) h(Y). \tag{3} $$

The papers [6], [7] show that (3) is equivalent to the Entropy Power Inequality. The form of $\alpha$ used in this proof suggests the following result: 

Corollary 1.2: Given independent random variables $X$ and $Y$ with finite and non-zero entropy power, there exist $X^*$ and $Y^*$ such that $X = \sqrt{\alpha} X^*$ and $Y = \sqrt{1-\alpha} Y^*$ for some $0 < \alpha < 1$, and such that $h(X^*) = h(Y^*)$. The Entropy Power Inequality, Theorem 1.1, is equivalent to the fact that 

$$ h(X + Y) \ge h(X^*), \tag{4} $$

with equality if and only if $X$ and $Y$ are Gaussian. 

Proof: Applying (2) and taking $\alpha = v(X)/(v(X) + v(Y))$ ensures that $X^* = X/\sqrt{\alpha}$ and $Y^* = Y/\sqrt{1-\alpha}$ have the property that $v(X^*) = v(Y^*) = v(X) + v(Y)$. 

Assume (4). Since $h(X + Y) \ge h(X^*)$, applying $E^{-1}$ to both sides we deduce that $v(X + Y) \ge v(X^*)$, which equals $v(X) + v(Y)$, so that the EPI (1) holds. 

Conversely, assume (1). Since $v(X + Y) \ge v(X) + v(Y) = v(Y^*) = v(X^*)$, applying $E$ to both sides, we deduce (4). ■ 
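
The choice of $\alpha$ in this proof can be checked numerically. The sketch below is a minimal Python illustration, assuming only the scaling rule $h(cX) = h(X) + \log c$; the entropy values 0.3 and 1.1 nats are hypothetical inputs, and the function names are invented for this example.

```python
import math

def entropy_power(h):
    """Entropy power v = exp(2h) / (2*pi*e) for a differential entropy h in nats."""
    return math.exp(2.0 * h) / (2.0 * math.pi * math.e)

def equalising_alpha(h_x, h_y):
    """alpha = v(X) / (v(X) + v(Y)), the choice made in the proof of Corollary 1.2."""
    v_x, v_y = entropy_power(h_x), entropy_power(h_y)
    return v_x / (v_x + v_y)

def rescaled_entropy(h, c):
    """Differential entropy of X / sqrt(c), namely h(X) - 0.5 * log(c)."""
    return h - 0.5 * math.log(c)

h_x, h_y = 0.3, 1.1  # hypothetical differential entropies of X and Y
alpha = equalising_alpha(h_x, h_y)
h_x_star = rescaled_entropy(h_x, alpha)        # X* = X / sqrt(alpha)
h_y_star = rescaled_entropy(h_y, 1.0 - alpha)  # Y* = Y / sqrt(1 - alpha)

print(alpha, h_x_star, h_y_star)
print(entropy_power(h_x_star), entropy_power(h_x) + entropy_power(h_y))
```

The two rescaled entropies agree, and their common entropy power is $v(X) + v(Y)$, as the proof requires.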

It is natural to conjecture that there should be a version of the EPI for discrete entropies $H$. We will show in Theorem 2.5 that an equivalent of this rephrased EPI does hold for discrete variables, whereas in Section IV we show that some other apparently natural versions of Theorem 1.1 in fact fail. 

In the context of sums of independent continuous random variables, Artstein, Ball, Barthe and Naor [9] proved a stronger type of result, referred to as monotonicity. Alternative proofs were later given by Tulino and Verdú [10] and by Madiman and Barron [11]. For example, Theorem 2 of [9] gives the following: 

Theorem 1.3: Given independent continuous random variables $X_i$ with finite variance, for any positive $\alpha_i$ such that $\sum_{i=1}^{n+1} \alpha_i = 1$, writing $\alpha^{(j)} = \sum_{i \ne j} \alpha_i = 1 - \alpha_j$, then 

$$ n h\Big( \sum_{i=1}^{n+1} \sqrt{\alpha_i} X_i \Big) \ge \sum_{j=1}^{n+1} \alpha^{(j)} h\Big( \sum_{i \ne j} \sqrt{\alpha_i/\alpha^{(j)}} X_i \Big). $$

This is called monotonicity since, choosing $\alpha_i = 1/(n+1)$, it implies that for independent and identically distributed $X_i$, the entropy of the normalized sum $h(\sum_{i=1}^{n} X_i/\sqrt{n})$ is monotone increasing in $n$. Equivalently, writing $d(X) = D(X \| \phi_{\mu_X, \sigma_X^2})$ for the relative entropy from $X$ to a normal of the same mean and variance, the relative entropy of the normalized sum $d(\sum_{i=1}^{n} X_i/\sqrt{n})$ is monotone decreasing in $n$. 

The other major contribution of this paper is Theorem 3.2, which establishes a discrete analogue of Theorem 1.3. Such monotonicity results as Theorem 1.3 imply strengthened Entropy Power Inequalities. By choosing 

$$ \alpha^{(l)} = \frac{n \, v\big( \sum_{i \ne l} Y_i \big)}{\sum_{j=1}^{n+1} v\big( \sum_{i \ne j} Y_i \big)} \tag{5} $$

(in the case that all $\alpha^{(l)} < 1$; if not, the result is automatic), Artstein et al. [9] showed that their Theorem 1.3 implies the following extension of the EPI, Theorem 1.1. 

Theorem 1.4: Given independent continuous random variables $Y_i$ with finite variance, the entropy powers satisfy 

$$ n \, v\Big( \sum_{i=1}^{n+1} Y_i \Big) \ge \sum_{j=1}^{n+1} v\Big( \sum_{i \ne j} Y_i \Big). $$

We observe that this strengthened EPI, Theorem 1.4, can be expressed in a similar way to Corollary 1.2. That is, given independent random variables $Y_i$, if there exist $\alpha_i$ such that $\sum_{i=1}^{n+1} \alpha_i = 1$ and $Y_i^* = Y_i/\sqrt{\alpha_i}$ have entropies such that $h\big( \sum_{i \ne j} \sqrt{\alpha_i/\alpha^{(j)}} \, Y_i^* \big) = h^*$ are constant in $j$, then 

$$ h\Big( \sum_{i=1}^{n+1} Y_i \Big) \ge h^*. \tag{6} $$

This again follows by observing that for each $j$, (2) implies that $v\big( \sum_{i \ne j} Y_i \big) = v^* \alpha^{(j)} = e^{2h^*} \alpha^{(j)}/(2 \pi e)$, so summing over $j$, the RHS of Theorem 1.4 is equal to $e^{2h^*} n/(2 \pi e)$, and the result follows. Note that in this case, the choice of $\alpha^{(j)}$ again coincides with that given by (5). In Theorem 3.3 we prove a discrete version of this result. 

The structure of the remainder of the paper is as follows. In Section II we introduce the thinning operation, and describe the resulting analogues of the EPI, Theorem 1.1. In Section III we show how these results can extend to provide monotonicity results. In Section IV we discuss two natural versions of the EPI which are not true. In the self-contained Appendices A and B we prove the two main results of the paper, namely the scaling result Theorem 2.4 and the monotonicity result Theorem 3.2. Although these results are related, they are proved independently, the first using a semigroup argument similar to that in [12] and the second using an examination of certain Hessian terms, and previous results from [13]. 

There has been considerable interest in proving an Entropy Power Inequality for discrete random variables. Some authors [14], [15], [16], [17] have focused on replacing the operation of integer addition $+$ by modulo 2 addition $\oplus$, and obtained similar results in that case. As in [18], we prefer to retain $+$ as integer addition. Harremoës and Vignat [19] proved that (1) holds when $X$ and $Y$ are any independent binomial $\mathrm{Bin}(n_X, 1/2)$ and $\mathrm{Bin}(n_Y, 1/2)$ random variables, on redefining $v(X) = e^{2H(X)}/(2 \pi e)$ (simply replacing differential entropies $h$ by discrete entropies $H$). We prefer to conjecture that the discrete version of the Entropy Power Inequality should be expressed differently, using the entropy of the Poisson distribution. 

II. Entropy and thinning 

Recent work of Harremoës, Johnson and Kontoyiannis [20], [21] shows that, in many senses related to Information Theory, the equivalent of scaling continuous random variables by a factor of $\sqrt{\alpha}$ is the thinning operation on discrete random variables, as introduced by Rényi [4]. 

Definition 2.1: The $\alpha$-thinned version of random variable $Y$ is given by the random sum $T_\alpha Y = \sum_{i=1}^{Y} B_i$, where the $B_1, B_2, \ldots$ are IID Bernoulli $\mathrm{Bern}(\alpha)$, all independent of $Y$. 
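
Definition 2.1 translates directly into a computation on probability mass functions: conditional on $Y = y$, the thinned variable is $\mathrm{Bin}(y, \alpha)$. The following Python sketch assumes a finitely supported $Y$ stored as a list; the helper names `binomial_pmf`, `thin` and `mean` are hypothetical.

```python
import math

def binomial_pmf(n, p):
    """Probability mass function of Bin(n, p) as a list indexed by 0..n."""
    return [math.comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(n + 1)]

def thin(pmf, alpha):
    """pmf of T_alpha Y, where pmf[y] = P(Y = y) on {0, ..., len(pmf) - 1}."""
    out = [0.0] * len(pmf)
    for y, prob in enumerate(pmf):
        # Given Y = y, the sum of y Bernoulli(alpha) variables is Bin(y, alpha).
        for k, b in enumerate(binomial_pmf(y, alpha)):
            out[k] += prob * b
    return out

def mean(pmf):
    return sum(k * p for k, p in enumerate(pmf))

y_pmf = binomial_pmf(4, 0.5)
thinned = thin(y_pmf, 0.5)
print(thinned)
print(mean(y_pmf), mean(thinned))
```

Thinning a $\mathrm{Bin}(4, 1/2)$ variable by $\alpha = 1/2$ gives $\mathrm{Bin}(4, 1/4)$, and the mean is multiplied by $\alpha$.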

We write $\mathcal{E}(t) = H(\Pi_t)$, an increasing concave function, for the entropy of a Poisson random variable $\Pi_t$ of mean $t$, and define an analogue of the entropy power as $V(X) = \mathcal{E}^{-1}(H(X))$. Theorem 2.5 of [12] proves that $\Pi_{\lambda_X}$ maximises entropy within the class of ultra log-concave (ULC) random variables $X$ (see below) of given mean $\lambda_X$, or that $V(X) \le \lambda_X$. We investigate the entropy of sums in the context of this restricted ULC class. 

Definition 2.2: The ULC random variables are those whose probability mass functions $P$ satisfy 

$$ i P(i)^2 \ge (i+1) P(i+1) P(i-1), \quad \text{for all } i \ge 1. $$
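
The ULC condition of Definition 2.2 is easy to test for a finitely supported mass function. The Python sketch below uses a hypothetical helper `is_ulc`, with a small relative tolerance because the Poisson family attains the condition with equality and floating-point rounding would otherwise matter.

```python
import math

def is_ulc(pmf, rel_tol=1e-9):
    """Check i * P(i)^2 >= (i + 1) * P(i + 1) * P(i - 1) for all i >= 1.

    pmf[i] = P(i) on {0, ..., len(pmf) - 1}; P is taken to be zero beyond the list.
    """
    n = len(pmf)
    for i in range(1, n):
        next_prob = pmf[i + 1] if i + 1 < n else 0.0
        lhs = i * pmf[i] ** 2
        rhs = (i + 1) * next_prob * pmf[i - 1]
        if lhs < rhs * (1.0 - rel_tol):
            return False
    return True

binomial = [math.comb(5, k) * 0.3**k * 0.7**(5 - k) for k in range(6)]
geometric_like = [0.5, 0.25, 0.125, 0.0625, 0.0625]  # geometric ratios, not ULC

print(is_ulc(binomial), is_ulc(geometric_like))
```

A binomial passes the check, while a mass function with geometric ratios fails it at $i = 1$.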



The ULC class includes the Poisson family and Bernoulli sums. This class was introduced in combinatorics [22], [23], a context in which the Bernoulli random variables are a natural fundamental building block. 

The results outlined in [20], [21] suggest an equivalence between scaling by $\sqrt{\alpha}$ and thinning by $\alpha$. This idea has developed with the fact that for discrete random variables, a natural equivalent of (3) is given by the following Thinned Entropy Concavity Inequality, proved by Yu and Johnson in [24], extending results in [13], and now a consequence of the more general Theorem 3.2 below. 

Theorem 2.3 (TECI): For independent ULC random variables $X$ and $Y$, for any $0 \le \alpha \le 1$, 

$$ H(T_\alpha X + T_{1-\alpha} Y) \ge \alpha H(X) + (1-\alpha) H(Y). \tag{7} $$

For $0 < \alpha < 1$, examination of the proof shows that equality holds if and only if $X$ and $Y$ are Poisson with the same mean. 

One major contribution of the present paper is the following theorem, which shows that for ULC random variables a one-sided equivalent of (2) holds. This result is proved in Appendix A using a semigroup designed to preserve entropy, a development of techniques in [12]. We refer to this result as the Restricted Thinned Entropy Power Inequality (RTEPI), since it is a special case of the Thinned Entropy Power Inequality. 

Theorem 2.4 (RTEPI): Given any ULC random variable $X$, 

$$ V(T_\alpha X) \ge \alpha V(X), \quad \text{for any } 0 \le \alpha \le 1. $$
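
Theorem 2.4 can be illustrated numerically. The Python sketch below is one possible way to evaluate $V(X) = \mathcal{E}^{-1}(H(X))$: it computes the Poisson entropy by summing over a truncated support and inverts it by bisection, relying on the monotonicity of $\mathcal{E}$ stated above. The function names, the truncation rule and the choice $X \sim \mathrm{Bin}(5, 0.3)$ are assumptions made for this example; it uses the standard fact that $T_\alpha \mathrm{Bin}(n, p)$ has the $\mathrm{Bin}(n, \alpha p)$ distribution.

```python
import math

def entropy(pmf):
    """Shannon entropy in nats of a probability mass function given as a list."""
    return -sum(p * math.log(p) for p in pmf if p > 0.0)

def binomial_pmf(n, p):
    return [math.comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(n + 1)]

def poisson_entropy(t):
    """E(t) = H(Pi_t), summed over a truncated support (tail mass is negligible)."""
    kmax = int(t + 12.0 * math.sqrt(t) + 30)
    probs, p = [], math.exp(-t)
    for k in range(kmax + 1):
        probs.append(p)
        p *= t / (k + 1)
    return entropy(probs)

def poisson_entropy_power(pmf):
    """V(X): the mean t of the Poisson variable with the same entropy as X."""
    h = entropy(pmf)
    lo, hi = 0.0, 1.0
    while poisson_entropy(hi) < h:  # bracket the root
        hi *= 2.0
    for _ in range(100):            # bisection on the increasing function E
        mid = 0.5 * (lo + hi)
        if poisson_entropy(mid) < h:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, p = 5, 0.3
v_x = poisson_entropy_power(binomial_pmf(n, p))
checks = []
for alpha in (0.25, 0.5, 0.75):
    v_thinned = poisson_entropy_power(binomial_pmf(n, alpha * p))
    checks.append((alpha, v_thinned, alpha * v_x))
    print(alpha, v_thinned, alpha * v_x)
```

In each row the second number, $V(T_\alpha X)$, should be at least the third, $\alpha V(X)$.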

In the continuous case, the equivalents of Theorems 2.3 and 2.4 allowed the full EPI, Theorem 1.1, to be deduced. Despite this, in Section IV we describe how two apparently natural equivalents of the EPI, namely (13) and (15), in fact fail in general. These results are stated as Examples 4.1 and 4.2 respectively. In Theorem 4.3 we discuss some conditions under which these results do hold. 

However, we can prove a discrete analogue of the rephrased Entropy Power Inequality, Corollary 1.2. The key operation is to invert the thinning operation on $X$, to create random variables $X^*$. This additional restriction means that the result holds in less generality than Corollary 1.2. 

Theorem 2.5: Given independent ULC random variables $X$ and $Y$, suppose there exist $X^*$ and $Y^*$ such that $X = T_\alpha X^*$ and $Y = T_{1-\alpha} Y^*$ for some $0 < \alpha < 1$, and such that $H(X^*) = H(Y^*)$. Then 

$$ H(X + Y) \ge H(X^*), \tag{8} $$

with equality if and only if $X$ and $Y$ are Poisson. 

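Proof: In analogy with the proof of Corollary 1.2, for any $\alpha$ we define $X^*_\alpha$ and $Y^*_\alpha$ (if such random variables exist) such that $X = T_\alpha X^*_\alpha$ and $Y = T_{1-\alpha} Y^*_\alpha$. The Thinned Entropy Concavity Inequality, Theorem 2.3, implies that 

$$ H(X + Y) = H(T_\alpha X^*_\alpha + T_{1-\alpha} Y^*_\alpha) \ge \alpha H(X^*_\alpha) + (1-\alpha) H(Y^*_\alpha). \tag{9} $$

This bound will hold for any $\alpha$, so choosing $\alpha$ such that $H(X^*_\alpha) = H(Y^*_\alpha)$, we deduce the result. ■ 

Unlike the continuous case, in general we cannot prove that this is the right choice of $\alpha$ by optimizing (9). However, we can give a related bound which we optimize, giving an alternative heuristic as to the right value of $\alpha$ to choose. That is, by Theorem 2.4, $V(X) = V(T_\alpha X^*_\alpha) \ge \alpha V(X^*_\alpha)$, and similarly for $Y$, so we deduce that 

$$ \alpha H(X^*_\alpha) + (1-\alpha) H(Y^*_\alpha) \le \alpha \mathcal{E}\Big( \frac{V(X)}{\alpha} \Big) + (1-\alpha) \mathcal{E}\Big( \frac{V(Y)}{1-\alpha} \Big). \tag{10} $$

Because $\mathcal{E}(\cdot)$ is concave, the RHS of (10) is maximized by $\alpha = V(X)/(V(X) + V(Y))$. 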

Note that it is not always possible to find $X^*$ and $Y^*$ as required in Theorem 2.5. For example, for $X \sim \mathrm{Bern}(p)$, there only exists $X^*$ such that $X = T_\alpha X^*$ when $\alpha \ge p$. In general, for any random variable $X$ with support on $\{0, \ldots, L\}$, there does not exist $X^*$ such that $X = T_\alpha X^*$ for $\alpha < EX/L$ (since thinning preserves the support, then $L \ge EX^* = EX/\alpha$). 
Such an X* will exist for all a when X lies in certain 
parametric families, including the geometric and Poisson, 
since these are preserved by thinning (see ll2n ). 

Some examples illustrate the bounds of Theorem 2.5. 

Example 2.6: Using Theorem 2.5: 

1) Given $X \sim \Pi_\lambda$ and $Y \sim \Pi_\mu$, take $X^* \sim Y^* \sim \Pi_{\lambda+\mu}$ and $\alpha = \lambda/(\lambda + \mu)$, to confirm that equality does indeed hold in (8) in this case. 

2) Given $X \sim \mathrm{Bin}(n, p)$ and $Y \sim \mathrm{Bin}(n, q)$, if $p + q \le 1$ then choosing $X^* \sim Y^* \sim \mathrm{Bin}(n, p+q)$ and $\alpha = p/(p+q)$, we deduce that 

$$ H(\mathrm{Bin}(n, p) + \mathrm{Bin}(n, q)) \ge H(\mathrm{Bin}(n, p+q)). \tag{11} $$

By results in Poisson approximation, we expect that this inequality will be tightest for $n$ large and $p$, $q$ small. This result (11) also follows from Theorem 1 of Shepp and Olkin [25], which states that if vector $\mathbf{p}$ majorizes $\mathbf{q}$ then $H(B_{\mathbf{p}}) \le H(B_{\mathbf{q}})$, where $B_{\mathbf{p}}$ is the Bernoulli sum $\sum_{i=1}^{n} \mathrm{Bern}(p_i)$. Vector $(p+q, p+q, \ldots, p+q, 0, 0, \ldots, 0)$ majorizes vector $(p, p, \ldots, p, q, q, \ldots, q)$. 

3) Given any identically distributed ULC random variables $X$ and $Y$, choosing $\alpha = 1/2$, we deduce that if there exists $X^*$ such that $X = T_{1/2} X^*$ then 

$$ H(X + Y) \ge H(X^*). $$

Note that such an $X^*$ does not exist for the random variables in Example 4.1, which may be relevant to the fact that these provide a counterexample to (13). 
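
The binomial bound (11) in part 2) of Example 2.6 can be checked directly by convolving mass functions. The Python sketch below uses hypothetical helper names and the arbitrary illustrative values $n = 10$, $p = 0.2$, $q = 0.3$.

```python
import math

def entropy(pmf):
    """Shannon entropy in nats of a probability mass function given as a list."""
    return -sum(p * math.log(p) for p in pmf if p > 0.0)

def binomial_pmf(n, p):
    return [math.comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(n + 1)]

def convolve(p, q):
    """pmf of the sum of two independent variables with pmfs p and q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

n, p, q = 10, 0.2, 0.3
h_sum = entropy(convolve(binomial_pmf(n, p), binomial_pmf(n, q)))
h_star = entropy(binomial_pmf(n, p + q))
print(h_sum, h_star)
```

The first printed entropy, for $\mathrm{Bin}(n, p) + \mathrm{Bin}(n, q)$, should be at least the second, for $\mathrm{Bin}(n, p+q)$.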

III. Monotonicity results 

The other major contribution of this paper is to establish a monotonicity result in Theorem 3.2, which we regard as a discrete analogue of Artstein et al.'s Theorem 1.3. 

In [13], corresponding monotonicity results were proved regarding the entropy and relative entropy of sums of thinned random variables, a situation in which the two types of monotonicity are not equivalent. Write $D(X) = D(X \| \Pi_{\lambda_X})$ for the relative entropy between a random variable $X$ with mean $\lambda_X$ and a Poisson with the same mean. Theorems 2 and 3 respectively of [13] showed that for independent and identically distributed $X_i$: 

1) the relative entropy $D\big( \sum_{i=1}^{n} T_{1/n} X_i \big)$ is monotone decreasing in $n$, 

2) for ULC $X_i$ the entropy $H\big( \sum_{i=1}^{n} T_{1/n} X_i \big)$ is monotone increasing in $n$. 

In the spirit of Theorem 1.3 we will place these results from [13] in a context where they can be deduced from more general results, Lemma 3.1 and Theorem 3.2. As a consequence we give a proof of monotonicity of entropy which uses distinct ideas from the convex ordering techniques used in [13]. The monotonicity of relative entropy is in fact implied by a stronger result which is implicit in [13]. 

Lemma 3.1: Given positive $\alpha_i$ such that $\sum_{i=1}^{n+1} \alpha_i = 1$, and writing $\alpha^{(j)} = \sum_{i \ne j} \alpha_i = 1 - \alpha_j$, then for any independent $X_i$, 

$$ n D\Big( \sum_{i=1}^{n+1} T_{\alpha_i} X_i \Big) \le \sum_{j=1}^{n+1} \alpha^{(j)} D\Big( \sum_{i \ne j} T_{\alpha_i/\alpha^{(j)}} X_i \Big). $$

Proof: Theorem 5 of [13] shows that for independent random variables $Y_i$, 

$$ n D\Big( \sum_{i=1}^{n+1} Y_i \Big) \le \sum_{j=1}^{n+1} D\Big( \sum_{i \ne j} Y_i \Big), $$
and Lemma 1 of ^ shows that D{TaX) < aD{X). 
Combining these two results we deduce that 



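$$ n D\Big( \sum_{i=1}^{n+1} T_{\alpha_i} X_i \Big) \le \sum_{j=1}^{n+1} D\Big( \sum_{i \ne j} T_{\alpha_i} X_i \Big) = \sum_{j=1}^{n+1} D\Big( T_{\alpha^{(j)}} \Big( \sum_{i \ne j} T_{\alpha_i/\alpha^{(j)}} X_i \Big) \Big) \le \sum_{j=1}^{n+1} \alpha^{(j)} D\Big( \sum_{i \ne j} T_{\alpha_i/\alpha^{(j)}} X_i \Big), $$

and the result follows. ■ 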
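We have to work harder to show that Theorem 3.2, the corresponding result in terms of entropy, holds as well. The proof of this result is given in Appendix B. 

Theorem 3.2: Given positive $\alpha_i$ such that $\sum_{i=1}^{n+1} \alpha_i = 1$, and writing $\alpha^{(j)} = \sum_{i \ne j} \alpha_i$, then for any independent ULC $X_i$, 

$$ n H\Big( \sum_{i=1}^{n+1} T_{\alpha_i} X_i \Big) \ge \sum_{j=1}^{n+1} \alpha^{(j)} H\Big( \sum_{i \ne j} T_{\alpha_i/\alpha^{(j)}} X_i \Big). \tag{12} $$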
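
Both sides of the monotonicity inequality (12) can be evaluated for small examples by thinning and convolving mass functions. The Python sketch below is an illustration only: the helper names are hypothetical, and the three binomial variables and the weights $(0.2, 0.3, 0.5)$ are arbitrary choices of ULC inputs with $n + 1 = 3$.

```python
import math

def entropy(pmf):
    return -sum(p * math.log(p) for p in pmf if p > 0.0)

def binomial_pmf(n, p):
    return [math.comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(n + 1)]

def convolve(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def thin(pmf, alpha):
    """pmf of T_alpha Y: given Y = y the thinned variable is Bin(y, alpha)."""
    out = [0.0] * len(pmf)
    for y, prob in enumerate(pmf):
        for k, b in enumerate(binomial_pmf(y, alpha)):
            out[k] += prob * b
    return out

def monotonicity_sides(pmfs, alphas):
    """Return (LHS, RHS) of (12) for independent variables with the given pmfs."""
    n = len(pmfs) - 1
    full = [1.0]
    for pmf, a in zip(pmfs, alphas):
        full = convolve(full, thin(pmf, a))
    lhs = n * entropy(full)
    rhs = 0.0
    for j in range(n + 1):
        a_j = 1.0 - alphas[j]  # alpha^{(j)}: total weight with index j left out
        part = [1.0]
        for i, (pmf, a) in enumerate(zip(pmfs, alphas)):
            if i != j:
                part = convolve(part, thin(pmf, a / a_j))
        rhs += a_j * entropy(part)
    return lhs, rhs

pmfs = [binomial_pmf(2, 0.5), binomial_pmf(3, 0.4), binomial_pmf(4, 0.7)]
lhs, rhs = monotonicity_sides(pmfs, [0.2, 0.3, 0.5])
print(lhs, rhs)
```

For these inputs the first printed value should be at least the second.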



This result gives further support to the 'general conjecture' of Gnedenko and Korolev [26, Pages 211-2] that 'the universal principle of non-decrease of uncertainty manifests itself in probability in the form of limit theorems when the limit is taken with respect to infinitely increasing number of "atomic" random variables involved in a model'. In particular Gnedenko and Korolev [26, Page 215] suggest that it is an important problem to 'give information proofs of limit theorems ... on convergence of random sums'. We believe that the fact that thinning is an operation defined via random summation means that Theorem 3.2 represents progress in the direction proposed by these authors. 

Note that Theorem 3.2 is a strengthened form of Theorem 2.3; indeed Theorem 2.3 can be deduced from it by successive deletion of terms. 

Just as Theorem 2.3 led to a proof of the rephrased Entropy Power Inequality, Theorem 2.5, Theorem 3.2 leads to a strengthened version of Theorem 2.5, analogous to (6). 

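Theorem 3.3: Assume there exist $Y_i^*$ and $\alpha_i$ such that $Y_i = T_{\alpha_i} Y_i^*$ for each $i$, and there exists some constant $H^*$ so that the entropies satisfy $H\big( \sum_{i \ne j} T_{\alpha_i/\alpha^{(j)}} Y_i^* \big) = H^*$ for all $j$. Then 

$$ H\Big( \sum_{i=1}^{n+1} Y_i \Big) \ge H^*. $$

Proof: Theorem 3.2 implies that 

$$ n H\Big( \sum_{i=1}^{n+1} Y_i \Big) = n H\Big( \sum_{i=1}^{n+1} T_{\alpha_i} Y_i^* \Big) \ge \sum_{j=1}^{n+1} \alpha^{(j)} H\Big( \sum_{i \ne j} T_{\alpha_i/\alpha^{(j)}} Y_i^* \Big) = n H^*, $$

using $\sum_{j=1}^{n+1} \alpha^{(j)} = n$, giving a discrete version of the rephrased strengthened Entropy Power Inequality. ■ 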

IV. Two natural discrete EPIs fail 

Since the Poisson distribution shares with the Gaussian the property of infinite divisibility, as in [18] one natural analogue of Theorem 1.1 comes from replacing $v$ by $V$, with equality holding if and only if $X$ and $Y$ are Poisson. However, as a counterexample provided by an anonymous referee previously showed, such a result turns out not to be true. 

Example 4.1: For independent discrete random variables $X$ and $Y$, it is not always the case that 

$$ V(X + Y) \ge V(X) + V(Y). \tag{13} $$

A counterexample is that $X \sim Y$, $P_X(0) = 1/6$, $P_X(1) = 2/3$, $P_X(2) = 1/6$. Notice that these $X$ and $Y$ are each the sum of Bernoulli random variables, and thus restriction of $X$ and $Y$ to the ULC class does not help. 
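
The counterexample of Example 4.1 can be reproduced numerically. The Python sketch below is one way to do so, assuming a bisection-based inverse of the Poisson entropy function (the helper names and the truncation of the Poisson support are choices made for this example, not part of the paper).

```python
import math

def entropy(pmf):
    return -sum(p * math.log(p) for p in pmf if p > 0.0)

def convolve(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poisson_entropy(t):
    """E(t) = H(Pi_t), summed over a truncated support."""
    kmax = int(t + 12.0 * math.sqrt(t) + 30)
    probs, p = [], math.exp(-t)
    for k in range(kmax + 1):
        probs.append(p)
        p *= t / (k + 1)
    return entropy(probs)

def poisson_entropy_power(pmf):
    """V(X) = E^{-1}(H(X)), found by bisection on the increasing function E."""
    h = entropy(pmf)
    lo, hi = 0.0, 1.0
    while poisson_entropy(hi) < h:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if poisson_entropy(mid) < h:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p_x = [1.0 / 6, 2.0 / 3, 1.0 / 6]  # X and Y share this mass function
v_x = poisson_entropy_power(p_x)
v_sum = poisson_entropy_power(convolve(p_x, p_x))
print("V(X + Y) =", v_sum, " V(X) + V(Y) =", 2.0 * v_x)
```

The printed value of $V(X + Y)$ falls below $V(X) + V(Y)$, so (13) fails for this pair.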

Equation (2) shows that an equivalent form of the EPI, Theorem 1.1, is that for any $0 \le \alpha \le 1$, 

$$ v(\sqrt{\alpha} X + \sqrt{1-\alpha} Y) \ge \alpha v(X) + (1-\alpha) v(Y) \tag{14} $$

(see [7]). In analogy with this, we might make another conjecture, which again turns out to not hold. 

Example 4.2: A natural conjecture, which we refer to as the Thinned Entropy Power Inequality, is that for independent discrete ULC random variables $X$ and $Y$, for any $0 \le \alpha \le 1$, 

$$ V(T_\alpha X + T_{1-\alpha} Y) \ge \alpha V(X) + (1-\alpha) V(Y), \tag{15} $$

with equality for $0 < \alpha < 1$ if and only if $X$ and $Y$ are Poisson. 

However, taking $X \sim \mathrm{Bern}(1/3) + \Pi_1$ and $Y \sim \Pi_{1000}$ and $\alpha = 0.999$, (15) is false. That is (taking all logs to base 2) $H(X) = 2.08286\ldots$, and $V(X) = 1.27189\ldots$. Clearly $V(Y) = 1000$. Hence the RHS $\alpha V(X) + (1-\alpha) V(Y) = 2.27062\ldots$. Then $T_\alpha X + T_{1-\alpha} Y \sim \mathrm{Bern}(\alpha/3) + \Pi_{\alpha + (1-\alpha)1000}$, with $H(T_\alpha X + T_{1-\alpha} Y) = 2.55729\ldots$, and $V(T_\alpha X + T_{1-\alpha} Y) = 2.25374\ldots$. In this case (15) fails. 
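
The numbers in Example 4.2 can be recomputed with a short script. The Python sketch below is an illustration under stated assumptions: the helper names are hypothetical, the Poisson mass functions are truncated at 80 terms (ample for means near 1 and 2), entropies are computed in nats and converted to bits only for display, and $V(Y) = 1000$ is used directly because the entropy power of a Poisson variable is its mean.

```python
import math

def entropy(pmf):
    return -sum(p * math.log(p) for p in pmf if p > 0.0)

def poisson_pmf(t, kmax):
    probs, p = [], math.exp(-t)
    for k in range(kmax + 1):
        probs.append(p)
        p *= t / (k + 1)
    return probs

def bernoulli_plus_poisson(p, t, kmax=80):
    """pmf of Bern(p) + Pi_t for independent summands (Poisson part truncated)."""
    pois = poisson_pmf(t, kmax)
    out = [0.0] * (kmax + 2)
    for k, q in enumerate(pois):
        out[k] += (1.0 - p) * q
        out[k + 1] += p * q
    return out

def poisson_entropy(t):
    return entropy(poisson_pmf(t, int(t + 12.0 * math.sqrt(t) + 30)))

def poisson_entropy_power(pmf):
    """V(X) = E^{-1}(H(X)), found by bisection on the increasing function E."""
    h = entropy(pmf)
    lo, hi = 0.0, 1.0
    while poisson_entropy(hi) < h:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if poisson_entropy(mid) < h:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha = 0.999
x_pmf = bernoulli_plus_poisson(1.0 / 3, 1.0)
v_x, v_y = poisson_entropy_power(x_pmf), 1000.0
# T_alpha X + T_{1-alpha} Y ~ Bern(alpha/3) + Poisson(alpha + (1 - alpha) * 1000)
mixed = bernoulli_plus_poisson(alpha / 3, alpha + (1.0 - alpha) * 1000.0)
v_lhs = poisson_entropy_power(mixed)
rhs = alpha * v_x + (1.0 - alpha) * v_y

print("H(X) in bits:", entropy(x_pmf) / math.log(2.0))
print("LHS of (15):", v_lhs, " RHS of (15):", rhs)
```

The left-hand side of (15) comes out smaller than the right-hand side, in line with the values quoted in the example.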

Notice that (15) fails even in the restricted case where $Y$ is Poisson, a case where we might hope that even stronger results might hold, in analogy with work of Costa [27]. The same is true of the conjecture (13): if that result held for $Y$ Poisson, then using Theorem 2.4 would imply that (15) held in the same case. 

As previously described, in the continuous case [7] proves (14) is equivalent to the Entropy Power Inequality. The key fact in this proof is the scaling result, (2). Since Theorem 2.4 is a one-sided version of this fact, we combine it with Theorem 2.3 to obtain the following partial results, which were proved as Proposition 2 and Corollary 2 respectively of [24], conditionally on the then unproved Theorem 2.4, so now hold without qualification. 

Theorem 4.3: Consider independent ULC random variables 
X and Y. 

1) For any /3, 7 such that < < ^ (note that 
in this case /3 + 7 < 1 unless V{X) = V{Y)), then 

$$ V(T_\beta X + T_\gamma Y) \ge \beta V(X) + \gamma V(Y). $$

2) If $Y \sim \Pi_\mu$, with $\mu \le V(X)$, then for all $0 \le \alpha \le 1$, 

$$ V(T_\alpha X + T_{1-\alpha} Y) \ge \alpha V(X) + (1-\alpha) V(Y). $$

We conjecture that there exist some $\alpha_- = \alpha_-(X, Y)$ and $\alpha_+ = \alpha_+(X, Y)$ (perhaps defined in terms of the means and entropies of $X$ and $Y$) such that for $\alpha_- \le \alpha \le \alpha_+$, (15) holds. However, as Example 4.2 shows, the unrestricted version of this equation fails. 

It is worth noticing that the condition on (3 and 7 in 
Theorem 1431 1) can be restated as PV{X) + (1 - 7)1^(1") < 
min(V^(X), V{Y)). Hence by assuming a weaker bound, this 
theorem proves a stronger one. 

Appendix A 
Proof of RTEPI, Theorem 2.4 

We prove the Restricted Thinned Entropy Power Inequality, Theorem 2.4, using a quantity $L(X)$ that plays a role analogous to the Fisher information in the work of Blachman [3] and Stam [2]. 

Definition A.1: For a random variable $X$ with probability mass function $P$, define the quantity 

$$ L(X) = \sum_{z=0}^{\infty} (z+1) P(z+1) \log \frac{P(z)}{P(z+1)}. $$
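
The quantity $L(X)$ is straightforward to compute for a finitely supported mass function. The Python sketch below (with hypothetical helper names) evaluates it and compares $L(T_\alpha X)/\alpha$ with a central finite-difference estimate of the derivative of $H(T_\alpha X)$ in $\alpha$, the identity that this appendix relies on. It takes $X \sim \mathrm{Bin}(4, 0.6)$ as an arbitrary example and uses the fact that $T_\alpha \mathrm{Bin}(n, p)$ has the $\mathrm{Bin}(n, \alpha p)$ distribution.

```python
import math

def entropy(pmf):
    return -sum(p * math.log(p) for p in pmf if p > 0.0)

def binomial_pmf(n, p):
    return [math.comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(n + 1)]

def score_functional(pmf):
    """L(X) = sum_z (z + 1) P(z + 1) log(P(z) / P(z + 1)).

    Assumes the support is an interval starting at 0, so P(z) > 0 when P(z + 1) > 0.
    """
    total = 0.0
    for z in range(len(pmf) - 1):
        if pmf[z + 1] > 0.0:
            total += (z + 1) * pmf[z + 1] * math.log(pmf[z] / pmf[z + 1])
    return total

n, p, alpha, step = 4, 0.6, 0.7, 1e-5
finite_diff = (entropy(binomial_pmf(n, (alpha + step) * p))
               - entropy(binomial_pmf(n, (alpha - step) * p))) / (2.0 * step)
predicted = score_functional(binomial_pmf(n, alpha * p)) / alpha
print(finite_diff, predicted)

# L(X) can be negative, for example for a Bernoulli variable with p > 1/2.
print(score_functional(binomial_pmf(1, 0.8)))
```

The two derivative estimates should agree to several decimal places.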



We develop the argument in [12], where we adapted random variables by thinning and then adding an independent Poisson random variable: 

Definition A.2: For a positive function $f(\alpha)$, define the combined map $U_{\alpha, f(\alpha)}$ that thins and then adds an independent Poisson random variable: 

$$ U_{\alpha, f(\alpha)} X = T_\alpha X + \Pi_{f(\alpha)}. $$

For most of this section, we assume that the random variable 
X has finite support. 

Proposition A.3: Consider a continuously differentiable function $f$ with $f(1) = 0$. Assume either (a) $f(t) = 0$ for all $t$ or (b) $f(t) > 0$ for $t < 1$. Given ULC $X$ with finite support, writing $X_t = U_{t, f(t)} X$ and $P_t(z) = P(X_t = z)$, then for any $0 < t < 1$ 

$$ \frac{\partial}{\partial t} H(X_t) = \frac{L(X_t)}{t} - r(t) \sum_{z=0}^{\infty} P_t(z) \log \frac{P_t(z)}{P_t(z+1)}, $$

where $r(t) = f(t)/t - f'(t)$. Equivalently, $f(t) = t \int_t^1 r(s)/s \, ds$. 



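Proof: From Equation (8) of [12], we know that the mass function of $X_t$ satisfies 

$$ \frac{\partial}{\partial t} P_t(z) = \Delta^* \Big( \frac{(z+1) P_t(z+1)}{t} - r(t) P_t(z) \Big), \tag{16} $$

where adjoint operators $\Delta$ and $\Delta^*$ are defined by $\Delta^* g(x) = g(x-1) - g(x)$ and $\Delta g(x) = g(x+1) - g(x)$. Then we simply differentiate the entropy, using (16) to obtain 

$$ \frac{\partial}{\partial t} H(X_t) = -\sum_{z=0}^{\infty} \Delta^* \Big( \frac{(z+1) P_t(z+1)}{t} - r(t) P_t(z) \Big) \log P_t(z) = \sum_{z=0}^{\infty} \Big( \frac{(z+1) P_t(z+1)}{t} - r(t) P_t(z) \Big) \log \frac{P_t(z)}{P_t(z+1)}, $$

and the result follows, where this final step uses Fubini's theorem. 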

The differentiation of the infinite series at $t$ can be justified in the case (a) since then the sum is simply a finite one. In case (b) it can be justified by a result (see [28]) concerning $F(s) = \sum_{z=0}^{\infty} u_s(z)$ with $a < s < b$. The derivative $\frac{\partial F}{\partial s} = \sum_{z=0}^{\infty} \frac{\partial}{\partial s} u_s(z)$ for $a < s < b$, assuming that $\frac{\partial}{\partial s} u_s(z)$ exist, and are uniformly bounded as $|\frac{\partial}{\partial s} u_s(z)| \le M(z)$, for all $a < s < b$, where $\sum_{z=0}^{\infty} M(z) < \infty$. 

Given a particular < i < 1, we can choose a < t < b such 
that this result holds. In this case, writing A = EX, Equation 
(9) of ini shows that P(T,X 0) > (1 - s)^, so that 



$$ P_s(z) \ge P(T_s X = 0) P(\Pi_{f(s)} = z) \ge (1-s)^{\lambda} e^{-f(s)} \frac{f(s)^z}{z!}, \tag{17} $$

hence for $a < s < b$, for all $z$, we can bound 

$$ |-\log P_s(z)| \le -\lambda \log(1-s) + f(s) + z |\log f(s)| + \log z!. $$

Since $f(s)$ is continuous and bounded away from zero on $(a, b)$, Stirling's formula means that this can be uniformly bounded by $C_1 + C_2 z^2$, where $C_1$ and $C_2$ depend on $a$ and $b$. Similarly, the triangle inequality means that 

$$ \Big| \frac{\partial P_s}{\partial s}(z) \Big| \le \frac{z P_s(z)}{s} + |r(s)| P_s(z-1) + \frac{(z+1) P_s(z+1)}{s} + |r(s)| P_s(z), $$

so the fact that $X$, and hence $X_s$, is ULC means that $P_s(z) \le (P_s(1)/P_s(0))^z P_s(0)/z!$. Hence, since (17) means that the ratio $P_s(1)/P_s(0)$ is uniformly bounded on $(a, b)$, the result follows by continuity (and hence boundedness) of $r(t)$. 

Note that although this result is stated for ULC $X$ with finite support, it should hold for any random variables such that the differentiation step can be justified. ■ 

Writing $J(t) = \mathcal{E}'(t) = \sum_{z=0}^{\infty} \Pi_t(z) \log((z+1)/t)$ (a positive function), we state the following isoperimetric inequality, equivalent to the RTEPI, Theorem 2.4, a technique suggested 
by ifTSl . This result may be of independent interest. 
Theorem A.4: For all ULC random variables $X$ with finite support, 

$$ L(X) \le V(X) J(V(X)). $$

Lemma A.5: For random variables $X$ with finite support, Theorems 2.4 and A.4 are equivalent. 

Proof: Write $g(\alpha)$ for $V(T_\alpha X)$. Assume Theorem 2.4 holds, so that $g(\alpha) \ge \alpha g(1)$ or, rearranging, that for $\alpha < 1$ 

$$ \frac{g(\alpha) - g(1)}{\alpha - 1} \le g(1) $$

(the change of direction of the inequality comes since $\alpha < 1$). Letting $\alpha \to 1$, we see that the RTEPI implies that $g'(1) \le g(1)$. 

The key is to observe that using Proposition A.3 the derivative of $H(T_\alpha X)$ with respect to $\alpha$ is $L(T_\alpha X)/\alpha$. This means that by the chain rule the derivative 

$$ g'(\alpha) = \frac{\partial}{\partial \alpha} \mathcal{E}^{-1}(H(T_\alpha X)) = \frac{L(T_\alpha X)}{\alpha J(\mathcal{E}^{-1}(H(T_\alpha X)))} = \frac{L(T_\alpha X)}{\alpha J(V(T_\alpha X))}, \tag{18} $$

so taking $\alpha = 1$, the result $g'(1) \le g(1)$ becomes Theorem A.4. 

We deduce the reverse implication by using (18), and applying Theorem A.4 to the random variable $T_\alpha X$, to deduce that 

$$ g'(\alpha) = \frac{L(T_\alpha X)}{\alpha J(V(T_\alpha X))} \le \frac{V(T_\alpha X)}{\alpha} = \frac{g(\alpha)}{\alpha}. $$

This implies that $g(\alpha)/\alpha$ is decreasing in $\alpha$, which means that $g(\alpha)/\alpha \ge g(1)/1$, which is Theorem 2.4. ■ 
We prove Theorem A.4 next, and hence deduce that Theorem 2.4 holds by Lemma A.5. Our approach involves the map $U_{\alpha, f(\alpha)}$ which preserves the entropy (as opposed to preserving the mean as in [12]). 

Proof of Theorem A.4: Since $L(X) = \frac{\partial}{\partial \alpha} H(T_\alpha X) \big|_{\alpha = 1}$, we know that $L(X)$ need not always be positive (consider for example $X \sim \mathrm{Bern}(p)$ with $p > 1/2$). However, note that if $L(X) \le 0$, then automatically $L(X) \le 0 \le V(X) J(V(X))$, as required. Hence, we can restrict our interest to the case where $L(X) > 0$. 

Now, $H(T_\alpha X)$ is a positive concave function of $\alpha$ which (since by [12] it is upper bounded by the entropy of a $\Pi_{\alpha \lambda_X}$ random variable) tends to zero as $\alpha$ tends to zero. Hence, $H(T_\alpha X)$ can only be decreasing in $\alpha$ for $\alpha \in (\alpha^*, 1]$, for some $\alpha^* > 0$. Hence, if $L(X) > 0$, then $L(T_\alpha X) \ge 0$ for all $\alpha \in [0, 1]$ and $H(T_\alpha X)$ is an increasing function of $\alpha$ for all $\alpha \in [0, 1]$. Hence, it is possible to perform an interpolation argument; that is, we can find $f(t) \ge 0$ such that $X_t = U_{t, f(t)} X$ has constant entropy. We write $\lambda_t$ for the mean of $X_t$. 

This means that, since the semigroup interpolates between $X_1 \sim X$ and $X_0 \sim \Pi_{\lambda'}$, a Poisson random variable with mean $\lambda'$, we can deduce that 

$$ H(X) = H(X_1) = H(X_0) = H(\Pi_{\lambda'}) = \mathcal{E}(\lambda'), $$

or that $\lambda' = V(X)$. 

Motivated by Proposition A.3 we consider properties of 

$$ r(t) = \frac{L(X_t)}{t \sum_{z=0}^{\infty} P_t(z) \log \big( P_t(z)/P_t(z+1) \big)}. $$

Note that by Chebyshev's rearrangement lemma (see for example Equation (1.7) of [29]) 

$$ L(X_t) = \sum_{z=0}^{\infty} P_t(z) \Big( \frac{(z+1) P_t(z+1)}{P_t(z)} \Big) \log \frac{P_t(z)}{P_t(z+1)} $$

is the expectation of the product of an increasing and decreasing function, so $L(X_t) \le \lambda_t \sum_{z=0}^{\infty} P_t(z) \log \big( P_t(z)/P_t(z+1) \big)$, or $r(t) \le \lambda_t/t$. We can write $L(X_t)$ as 



$$ -\lambda_t D(P_t^* \| P_t) + \sum_{z=0}^{\infty} (z+1) P_t(z+1) \log \Big( \frac{z+1}{\lambda_t} \Big) $$

$$ \le -D(P_t \| \Pi_{\lambda_t}) + \sum_{z=0}^{\infty} (z+1) P_t(z+1) \log \Big( \frac{z+1}{\lambda_t} \Big) \tag{19} $$

$$ = H(X_t) - \sum_{z=0}^{\infty} P_t(z+1) \log((z+1)!) - \lambda_t + \sum_{z=0}^{\infty} (z+1) P_t(z+1) \log(z+1), \tag{20} $$

where $P_t^*(x) = P_t(x+1)(x+1)/\lambda_t$ is the size-biased version of $P_t$, and (19) follows by Equation (0.6) of Wu [30]. 

Theorem A.4 will follow if we can prove that this expression (20), which we shall refer to as $U(X_t)$, is a decreasing function of $t$. That would mean that 

$$ L(X) = L(X_1) \le U(X_1) \le U(X_0) = \lambda' J(\lambda') = V(X) J(V(X)). $$

In fact, since $H(X_t)$ is constant, equivalently, we will prove that $U(X_t) - H(X_t)$ is a decreasing function of $t$. 

Case A: $r(t) \ge 0$ for all $t$. We simply differentiate (20), using Equation (16), and express 

dUjXt) 
dt 



iz + 2)Pt{z + 2) \ z + 2 

-r{t) (z + l)log 



tPt{z + l) 



z + l 
(21) 



The term-by-term differentiation can be justified as before, since the assumption that $r(t) = -t (f(t)/t)' \ge 0$ implies that $f(t) \ge 0$ for $t \le 1$, so the assumptions of Proposition A.3 hold. Hence the entropy can indeed be differentiated, and the functions $\log z!$ and $z \log z$ can be controlled using a similar argument. Since $-(z+1) \log \big( (z+2)/(z+1) \big) + 1 \ge 0$, Equation (21) is increased on replacing $r(t)$ by the (larger) value $\lambda_t/t$, so we deduce that 



au(Xt) 

dt 



is less than or equal to 



2—0 \ V / / 

(22) 

Observe that (22) is the covariance of decreasing and increasing functions, and hence is negative by the Chebyshev rearrangement lemma. We have shown that if $L(X_t) \ge 0$ for all $t$, so that $r(t) \ge 0$ for all $t$, then $L(X_t)$ is a decreasing function of $t$. 

Case B: $r(t) < 0$ for some $t$. Recall that we need only consider the case where $L(X) = L(X_1) > 0$. Define $t^* = \sup\{t \ge 0 : r(t) < 0\}$. Suppose that $t^* > 0$. For all $t > t^*$, $r(t) \ge 0$, so that for all $t > t^*$, we know that $L(X_t) \ge L(X) > 0$. By considering $t$ arbitrarily close to $t^*$, continuity of $L(X_t)$ implies that $L(X_t) > 0$ for all $t \in (t^* - \epsilon, t^*)$. This contradicts the assumption that $t^* > 0$, so we deduce that $r(t) \ge 0$ for all $t > 0$, and the result follows. ■ 
Proof of Theorem 2.4: By Lemma A.5 we deduce from Theorem A.4 that the RTEPI, Theorem 2.4, holds for ULC $X$ with finite support. 

For general ULC $X$, let $X^{(k)}$ be the random variable $X$ truncated at $k$, for $k = 1, 2, \ldots$. Then the mass function of $T_\alpha X^{(k)}$ tends to that of $T_\alpha X$ pointwise, for all $0 \le \alpha \le 1$. Moreover, the mean of $T_\alpha X^{(k)}$ tends to that of $T_\alpha X$. The argument of Part 2) in Theorem 1 of [13] shows that $H(T_\alpha X^{(k)}) \to H(T_\alpha X)$ as $k \to \infty$ (the basic argument is to apply Fatou's lemma twice). Because $\mathcal{E}^{-1}(\cdot)$ is continuous, we have $V(T_\alpha X^{(k)}) \to V(T_\alpha X)$ as $k \to \infty$. Thus Theorem 2.4 holds by taking a limit on the finite support result. ■ 



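The second term of the Hessian, $\Phi_2''$, can be explicitly evaluated by writing $\theta(t) = t - t \log t$ and expressing 

$$ (\Phi_2'')_{ij} = \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} \theta\Big( \sum_{k=1}^{n+1} \alpha_k \lambda_k \Big) = -\frac{\lambda_i \lambda_j}{\sum_{k=1}^{n+1} \alpha_k \lambda_k}. \tag{25} $$

We now examine the Hessian $\Phi''$ in more detail, using techniques that extend the proof of Theorem 2.3 given in [24], first introducing a sufficient condition. 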

Condition 1: We say that vectors fi and /3 satisfy the 
positive splitting condition if there exist positive Uij such that 

1) For all i,j the terms 



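Appendix B 
Proof of monotonicity, Theorem 3.2 

In this section, we prove monotonicity of entropy by analysing certain directional derivatives of an 'energy' functional $\Lambda$. For $X$ with expectation $\lambda_X$, we write $\Lambda(X) = -E \log \Pi_{\lambda_X}(X) = \lambda_X + E \log X! - \lambda_X \log \lambda_X$. In this section, we will establish the following proposition: 

Proposition B.1: Given positive $\alpha_i$ such that $\sum_{i=1}^{n+1} \alpha_i = 1$, and writing $\alpha^{(j)} = \sum_{i \ne j} \alpha_i$, then for any independent ULC $X_i$, 

$$ n \Lambda\Big( \sum_{i=1}^{n+1} T_{\alpha_i} X_i \Big) \ge \sum_{j=1}^{n+1} \alpha^{(j)} \Lambda\Big( \sum_{i \ne j} T_{\alpha_i/\alpha^{(j)}} X_i \Big). \tag{23} $$

As in [24], Lemma 3.1 can be subtracted from Proposition B.1 to deduce that Theorem 3.2 holds, since $H(X) = \Lambda(X) - D(X)$. We will write $\alpha = (\alpha_1, \ldots, \alpha_{n+1})$ and given independent ULC $X_i$ with means $\lambda_i$ we will define the function $\Phi(\alpha) = \Lambda\big( \sum_{i=1}^{n+1} T_{\alpha_i} X_i \big)$. We write 

$$ p_\alpha(\mathbf{x}) = P(T_{\alpha_1} X_1 = x_1, \ldots, T_{\alpha_{n+1}} X_{n+1} = x_{n+1}) $$

and $Q_\alpha(s) = \sum_{\mathbf{x} : \sum_i x_i = s} p_\alpha(\mathbf{x})$. In order to establish Proposition B.1 we will need to understand the properties of the Hessian matrix $\Phi''$, which we write as the sum of two matrices $\Phi'' = \Phi_1'' + \Phi_2''$. The first matrix, 

$$ (\Phi_1'')_{ij} = \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} \sum_{s=0}^{\infty} Q_\alpha(s) \log s!, $$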



can be evaluated using Equation ( fTSI l - we omit the details for 
brevity: 

Lemma B.2: For any a, i and j the derivative 



s=0 x:J2i x^-- 



s- 1 



(24) 



2) For all j the terms (^X^i^^j ^u) /iPj-^j) ^^^l^^ same 
value, S say. 

Observe that if Condition [T] holds, then multiplying the terms 
in Part 2. by /3jXj and summing over j we deduce that 



so that 




iirfE4^(/5^^0 + EA^«/^.^'^.) 



(26) 



This property allows us to deduce the following result: 

Theorem B.3: If fi and /3 satisfy the positive splitting 
condition. Condition [T] then fi'^^"{l3)fi < 0. 

Proof: We use Lemma |BT2] to deduce that, writing for 
the ith unit vector, s — Xi and x^*'^) = x — e^, then we 



g 



can express the product fi^^'{{f3)fi as 




oo 

E 

s=0 



s)s log 



< 



(28) 



(29) 



(30) 



Here Equation dZTl i follows by comparing coefficients of XiXj, 
using Part 1. of Condition [1] Equation ( |28] l follows as in 
||24 |. using Chebyshev's rearrangement lemma, and the fact 
that {xi + w) log((a;i + w)/{xi + w — 1)) is increasing in Xi 
and log((xi + w) / {xi + w — 1) is decreasing in Xi (coupled 
with the assumption that Uij > 0). Equation ( |29] l uses Part 2. 
of Condition [T] Equation (l30l l follows using ( |26] | since, as in 
Il24l . slog((s + l)/s) < 1. Finally we use the expression for 
$2 given in Equation (IZSl l. ■ 
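We can use this result to complete the proof of monotonicity of entropy, Theorem 3.2, by proving Proposition B.1. 

Proof of Proposition B.1: For each $l$, we can define a one-parameter map which interpolates between the values of $\alpha$. That is, for each $l$, define 

$$ A_l(t) = (1-t) \boldsymbol{\alpha}^{(l)} + t e_l, $$

where $\boldsymbol{\alpha}^{(l)} = (\alpha_1, \ldots, \alpha_{l-1}, 0, \alpha_{l+1}, \ldots, \alpha_{n+1})/\alpha^{(l)}$ is the renormalized 'leave one out' vector, and $e_l$ is the $l$th unit vector. We write $\mu_l = e_l - \boldsymbol{\alpha}^{(l)} = \frac{\partial}{\partial t} A_l(t)$. Observe that $A_l(0) = \boldsymbol{\alpha}^{(l)}$ and $A_l(\alpha_l) = \alpha$, meaning that by Taylor's theorem, for some $t_l^* \in [0, \alpha_l]$, if the relevant Hessian term is negative, 

$$ \Phi(\boldsymbol{\alpha}^{(l)}) = \Phi(\alpha) - \alpha_l \mu_l^T \Phi'(\alpha) + \frac{\alpha_l^2}{2} \mu_l^T \Phi''(A_l(t_l^*)) \mu_l \le \Phi(\alpha) - \alpha_l \mu_l^T \Phi'(\alpha). \tag{31} $$

If this is true for each $l$, on multiplying by $\alpha^{(l)}$ and summing over $l$ we deduce that $\sum_{l=1}^{n+1} \alpha^{(l)} \Phi(\boldsymbol{\alpha}^{(l)}) \le n \Phi(\alpha)$ and the proof is complete. (This uses the property that $\sum_l \alpha^{(l)} \alpha_l \mu_l = 0$, which is a consequence of the fact that $\sum_l \alpha^{(l)} \alpha_l \boldsymbol{\alpha}^{(l)} = \sum_l \alpha_l (\alpha_1, \ldots, \alpha_{l-1}, 0, \ldots, \alpha_{n+1}) = (\alpha_1 \alpha^{(1)}, \ldots, \alpha_{n+1} \alpha^{(n+1)}) = \sum_l \alpha^{(l)} \alpha_l e_l$, as required.) 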

We complete the proof by checking the negativity of the 
relevant Hessians by testing positive splitting. Condition!!] and 
applying Theorem IB. 31 There are considerable simplifications 



in this case, since the majority of the values of Vij{Ai{t^), /X;) 
vanish. That is, if i,j ^ I then for any t the Vij{Ai{t), fii) 
becomes 



v(0 



aj/a 



(0 



^a.(l-t)/a(0 a,{l-t)/aiy a.{t)a,{t)X.A, ^ 0. 
In the remaining case, when i ^ I and j ~ I, the vu (A; (t), /X;) 



IS 



ai/a 



(0 



^{l-t)/aW t 



a^{l-t) 



tx,x, 



aiXjX 



(32) 



We can exhibit a set of positive solutions to the required equa- 
tions by writing X{t) = Y,. ai{t)Xi, A(''(t) = = 
X{t) - tXi and S = {X^^Ht)Xi)/{t{l - t)^X{t)). Then define 
Uij to be zero unless i or j equals /, in which case for i ^ I, 



SaiXt{l-t) 
uii = and Uii 



(33) 



We confirm that this choice of u satisfies Condition[T]- firstly 
clearly these terms are positive. Secondly for all i ^ I, the 

sum 



Uli 



- Uil 



a,; A, 



ay 



(') \t{l-t) 



xm~t), 



Finally, for u as defined in (|33] |. writing Aij{t) for the 
jth component of Ai{t), the sums '^ij / 

indeed equal S for each j. Specifically, for j ^ I there is only 
non-zero term in the sum, giving 7i;j/(A; j(t)Aj ) = 5, since 



sum becomes 



, (1 - t)/Q!(''. For i = I, since Ai^^{t) = t, the 



Uil 



Ai.At)Xi til~t)aWX{t) 

as required. Hence Condition [T] holds in this case, so we can 
apply Theorem |B3] to deduce that /xf $"(A;(t))/X; < for 
all t. This means that ( |3T] ) holds for each I, and the proof of 
Proposition IB. II is complete. ■ 